Skip to content

Remove empty values node after predicate-expression based pruning#16019

Merged
phd3 merged 1 commit intotrinodb:masterfrom
phd3:remove-empty-values
Mar 6, 2023
Merged

Remove empty values node after predicate-expression based pruning#16019
phd3 merged 1 commit intotrinodb:masterfrom
phd3:remove-empty-values

Conversation

@phd3
Copy link
Copy Markdown
Member

@phd3 phd3 commented Feb 7, 2023

Description

PushPredicateIntoTableScan rule invokes pushFilterIntoTableScan with pruneWithPredicateExpression=false. AddExchanges invokes this with pushFilterIntoTableScan=true, seems like the reason being that we've more control over the # of times such a potentially expensive optimization is invoked through a visitor based optimizer.

Because of this - there can be cases where predicate-expression based pruning only happens during AddExchanges - leaving us with empty values node at this stage. And if that ends up on the left side of a replicated join - we end up in a situation where a single task is doing a join with LHS being local empty values node and RHS being a remote scan.

This PR adds an invocation of pushFilterIntoTableScan with pruneWithPredicateExpression=true before AddExchanges happens, giving us an opportunity to prune the empty values node - yielding a much more efficient plan.

For the test added in this PR:

With the optimization:

Output[columnNames = [a]]
│   Layout: [a_3:varchar]
│   a := a_3
└─ InnerJoin[criteria = ("a_3" = "a_6"), hash = [$hashvalue, $hashvalue_9]]
   │   Layout: [a_3:varchar]
   │   dynamicFilterAssignments = {a_6 -> #df_648}
   ├─ ScanFilterProject[table = test:io.trino.connector.MockConnectorTableHandle@5ebb65cd, filterPredicate = (substring("c_5", BIGINT '1', BIGINT '10') = VARCHAR 'Y'), dynamicFilters = {"a_3" = #df_648}]
   │      Layout: [a_3:varchar, $hashvalue:bigint]
   │      $hashvalue := combine_hash(bigint '0', COALESCE("$operator$hash_code"("a_3"), 0))
   │      a_3 := MockConnectorColumnHandle{name=a, type=varchar}
   │      c_5 := MockConnectorColumnHandle{name=c, type=varchar}
   │          :: [[Y]]
   └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_9], arguments = ["a_6"]]
      │   Layout: [a_6:varchar, $hashvalue_9:bigint]
      └─ ScanProject[table = test:io.trino.connector.MockConnectorTableHandle@f087706d]
             Layout: [a_6:varchar, $hashvalue_10:bigint]
             $hashvalue_10 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("a_6"), 0))
             a_6 := MockConnectorColumnHandle{name=a, type=varchar}

Without the optimization: (with broadcast join enabled and forceSingleNode=false)

Output[columnNames = [a]]
│   Layout: [a:varchar]
└─ InnerJoin[criteria = ("a" = "a_6"), hash = [$hashvalue, $hashvalue_13], distribution = REPLICATED]
   │   Layout: [a:varchar]
   │   Distribution: REPLICATED
   │   dynamicFilterAssignments = {a_6 -> #df_686}
   ├─ LocalExchange[partitioning = ROUND_ROBIN]
   │  │   Layout: [a:varchar, $hashvalue:bigint]
   │  ├─ Project[]
   │  │  │   Layout: [a_0:varchar, $hashvalue_10:bigint]
   │  │  │   $hashvalue_10 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("a_0"), 0))
   │  │  └─ Values[]
   │  │         Layout: [a_0:varchar]
   │  └─ RemoteExchange[type = GATHER]
   │     │   Layout: [a_3:varchar, $hashvalue_11:bigint]
   │     └─ ScanFilterProject[table = test:io.trino.connector.MockConnectorTableHandle@5ebb65cd, filterPredicate = (substring("c_5", BIGINT '1', BIGINT '10') = VARCHAR 'Y'), dynamicFilters = {"a_3" = #df_686}]
   │            Layout: [a_3:varchar, $hashvalue_12:bigint]
   │            $hashvalue_12 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("a_3"), 0))
   │            a_3 := MockConnectorColumnHandle{name=a, type=varchar}
   │            c_5 := MockConnectorColumnHandle{name=c, type=varchar}
   │                :: [[Y]]
   └─ LocalExchange[partitioning = HASH, hashColumn = [$hashvalue_13], arguments = ["a_6"]]
      │   Layout: [a_6:varchar, $hashvalue_13:bigint]
      └─ RemoteExchange[type = GATHER]
         │   Layout: [a_6:varchar, $hashvalue_14:bigint]
         └─ ScanProject[table = test:io.trino.connector.MockConnectorTableHandle@f087706d]
                Layout: [a_6:varchar, $hashvalue_15:bigint]
                $hashvalue_15 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("a_6"), 0))
                a_6 := MockConnectorColumnHandle{name=a, type=varchar}

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# General
* Improve execution for some query shapes involving unions by pushing down predicates more aggressively. ({issue}`16019`)

@cla-bot
Copy link
Copy Markdown

cla-bot bot commented Feb 7, 2023

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Pratham Desai.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@phd3 phd3 marked this pull request as draft February 7, 2023 19:59
@phd3 phd3 force-pushed the remove-empty-values branch from cae105d to 4d9930a Compare February 7, 2023 21:10
@cla-bot cla-bot bot added the cla-signed label Feb 7, 2023
@phd3 phd3 requested a review from martint February 21, 2023 17:21
* The difference between this optimizer and {@link PushPredicateIntoTableScan} rule is that we
* invoke {@link PushPredicateIntoTableScan#pushFilterIntoTableScan} here with pruneWithPredicateExpression=true.
*/
public class PushFilterIntoTableScanWithPredicatePruning
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have to have a separate optimizer for this. It can be done by parameterizing the PushPredicateIntoTableScan to optionally prune with predicate expression.

Also, it should be implemented as a Rule, not as a PlanOptimizer -- we're trying to deprecate visitor based optimizers in favor of rules wherever possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint added another invocation of PushPredicateIntoTableScan iterative optimizer

@phd3 phd3 force-pushed the remove-empty-values branch from 4d9930a to a376384 Compare February 23, 2023 23:52
@github-actions github-actions bot added jdbc Relates to Trino JDBC driver tests:hive labels Feb 24, 2023
@phd3 phd3 force-pushed the remove-empty-values branch from a376384 to 373241c Compare February 25, 2023 00:42
@phd3 phd3 force-pushed the remove-empty-values branch from 373241c to 35c94cb Compare February 27, 2023 14:45
@phd3 phd3 requested a review from martint February 27, 2023 19:25
@phd3 phd3 marked this pull request as ready for review February 28, 2023 01:22
@phd3 phd3 merged commit 9d134f8 into trinodb:master Mar 6, 2023
@github-actions github-actions bot added this to the 410 milestone Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed jdbc Relates to Trino JDBC driver

Development

Successfully merging this pull request may close these issues.

3 participants